The dataset and its documentation can be downloaded here: Loan Data from Prosper and Explanations to the Dataset
Prosper is a platform where individuals can invest in personal loans or request to borrow money.
Here is a YouTube video from Fox Business in which the CEO of Prosper explains the Prosper system.
Dataset: Read in Dataset and Libraries:
loan <- read.csv('prosperLoanData.csv')
library(ggplot2)
library(gridExtra)
library(RColorBrewer)
Note: The dataset ‘prosperLoanData.csv’ must be in the same folder as this ‘Project_3.Rmd’ file.
In the following, 19 variables of the dataset are plotted and described. These are divided into numeric and categorical data. The numeric data are also described with the R summary command.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
Note: To avoid outliers in the plot, the data were restricted to the range between the 0.1% and 99.9% quantiles.
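The quantile-based trimming described in the note could be sketched as follows. This is a minimal illustration on synthetic data, not the code used for this report; the vector `amount` and the other variable names are made up for the example.

```r
# Synthetic stand-in for a skewed numeric column such as LoanOriginalAmount.
set.seed(1)
amount <- c(rexp(10000, rate = 1 / 8000), 1e6)  # one extreme outlier appended

# Lower and upper cut-offs at the 0.1% and 99.9% quantiles.
limits <- quantile(amount, probs = c(0.001, 0.999), na.rm = TRUE)

# Keep only the values inside the cut-offs for plotting.
trimmed <- amount[amount >= limits[1] & amount <= limits[2]]

# The summary is still computed on the full data; only the plot is trimmed.
summary(amount)
range(trimmed)
```

Instead of subsetting the data first, the same two limits could be passed to `xlim()` in a ggplot2 plot.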
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00653 0.15630 0.20980 0.21880 0.28380 0.51230 25
Note: To avoid outliers in the plot, the data were restricted to the range between the 0.1% and 99.9% quantiles.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1340 0.1840 0.1928 0.2500 0.4975
Lender Yield is the Borrower Rate less Service Fees.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0100 0.1242 0.1730 0.1827 0.2400 0.4925
Note: To avoid outliers in the plot, the data were restricted to the range between the 0% and 99% quantiles.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 44.00 80.48 115.00 1189.00
Note: To avoid outliers in the plot, the data were restricted to the range between the 0% and 99% quantiles.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 131.6 217.7 272.5 371.6 2252.0
Note: To avoid outliers in the plot, the data were restricted to the range from 0 to 20000 USD.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3200 4667 5608 6825 1750000
Note: To avoid outliers in the plot, the data were restricted to ratios from 0 to 1.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
The Average Credit Score is the midpoint of the lower and upper bounds of the credit score range provided by a consumer credit rating agency.
#AverageCreditScore
loan$AverageCreditScore <- (loan$CreditScoreRangeLower+
loan$CreditScoreRangeUpper)/2
Note: To avoid outliers in the plot, the data were restricted to the range between the 1% and 99% quantiles.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 9.5 669.5 689.5 695.1 729.5 889.5 591
#LoanPerInvestor
Investors2 <- subset(loan, Investors > 1)
Investors2$LoanPerInvestor <-
  Investors2$LoanOriginalAmount / Investors2$Investors
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.299 48.190 70.000 220.200 125.000 12500.000
Explanation of the Debt Coverage Ratio
It is the ratio of Stated Monthly Income to Monthly Loan Payment.
#Debt Coverage Ratio
loan$DebtCoverageRatio2 = loan$StatedMonthlyIncome/loan$MonthlyLoanPayment
Note: To avoid outliers in the plot, the data were restricted to ratios from 0 to 100.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 13 20 Inf 35 Inf 15
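The mean and maximum of `Inf` in the summary above are what division by a Monthly Loan Payment of zero produces, and `0/0` yields `NaN`, which `summary()` counts among the NA's. A minimal sketch of one way to handle this, using a hypothetical four-row data frame `loan_demo` rather than the real data:

```r
# Hypothetical mini data frame with the two columns used in the ratio.
loan_demo <- data.frame(
  StatedMonthlyIncome = c(3000, 5000, 4000, 0),
  MonthlyLoanPayment  = c(150, 0, 200, 0)
)

ratio <- loan_demo$StatedMonthlyIncome / loan_demo$MonthlyLoanPayment
ratio  # 20 Inf 20 NaN

# Treat non-finite results (Inf, -Inf, NaN) as missing values.
ratio_clean <- ifelse(is.finite(ratio), ratio, NA)

# With the non-finite values removed, the mean is finite again.
mean(ratio_clean, na.rm = TRUE)
```

Whether such rows should be dropped or kept as a separate group is a judgment call that depends on the analysis.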
## Warning in `[<-.factor`(`*tmp*`, loan$IsBorrowerHomeowner == FALSE, value
## = structure(c(2L, : invalid factor level, NA generated
## Warning in `[<-.factor`(`*tmp*`, loan$IsBorrowerHomeowner == TRUE, value =
## structure(c(2L, : invalid factor level, NA generated
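Warnings of this kind arise when a value that is not one of a factor's existing levels is assigned into that factor; R then stores `NA` instead. The statement that triggered them is not shown in this report, so the following is only a sketch of the general mechanism and of one common way around it, using a made-up vector `homeowner`.

```r
# A factor with the levels "False" and "True".
homeowner <- factor(c("True", "False", "True"))

# Assigning a value that is not an existing level stores NA and raises
# "invalid factor level, NA generated". suppressWarnings() is used here
# only to keep the demonstration quiet.
broken <- homeowner
suppressWarnings(broken[broken == "False"] <- "no")
broken  # the second element is now NA

# Converting to character first avoids the problem.
fixed <- as.character(homeowner)
fixed[fixed == "False"] <- "no"
fixed[fixed == "True"]  <- "yes"
fixed <- factor(fixed, levels = c("yes", "no"))
fixed
```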
Average Credit Score
Loan Per Investor
Debt Coverage Ratio
Correlation:
a) BorrowerRate & BorrowerAPR
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and BorrowerAPR
## t = 2347.699, df = 113910, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9897057 0.9899409
## sample estimates:
## cor
## 0.989824
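Output of this shape is what `cor.test()` from base R prints. As a minimal sketch on synthetic data (the vectors `rate` and `apr` are made up and only mimic two closely related columns), the individual parts of the result can also be accessed directly:

```r
set.seed(42)
# Synthetic stand-ins: an APR that is the rate plus a small noisy fee.
rate <- runif(500, min = 0.05, max = 0.35)
apr  <- rate + 0.02 + rnorm(500, sd = 0.005)

result <- cor.test(rate, apr, method = "pearson")

result$estimate   # sample correlation coefficient
result$conf.int   # 95 percent confidence interval
result$parameter  # degrees of freedom (n - 2)
result$p.value
```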
b) BorrowerRate & LenderYield
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and LenderYield
## t = 8493.938, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9992021 0.9992204
## sample estimates:
## cor
## 0.9992113
c) BorrowerAPR & LenderYield
##
## Pearson's product-moment correlation
##
## data: BorrowerAPR and LenderYield
## t = 2291.732, df = 113910, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9892049 0.9894515
## sample estimates:
## cor
## 0.9893289
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and LoanOriginalAmount
## t = -117.5822, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3341283 -0.3237719
## sample estimates:
## cor
## -0.3289599
##
## Pearson's product-moment correlation
##
## data: Investors and LoanOriginalAmount
## t = 138.7077, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3751140 0.3850494
## sample estimates:
## cor
## 0.3800926
Note: The red dot shows the mean of the plotted feature(s)
Loans around 25000 USD have the most investors, but there are also many high loans with only 1 investor.
The correlation coefficient of 0.38 indicates a modest positive correlation between Investors and Loan Original Amount.
In the following, only loans with more than 1 investor are analysed.
##
## Pearson's product-moment correlation
##
## data: Investors and LoanOriginalAmount
## t = 263.0167, df = 86121, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6636999 0.6711074
## sample estimates:
## cor
## 0.6674202
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and MonthlyLoanPayment
## t = 867.8179, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9312165 0.9327426
## sample estimates:
## cor
## 0.9319837
Note: The red dot shows the mean of the plotted feature(s)
There is a huge spread between a credit score of 500 and 800. Loans over 10000 USD mostly have a credit score over 700. Loans under 10000 USD are mostly realised with a credit score of 500 and higher.
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and AverageCreditScore
## t = 122.0719, df = 113344, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3357190 0.3460095
## sample estimates:
## cor
## 0.3408745
People mostly have higher loans when they have a higher income. But there are also exceptions (see bottom right).
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and StatedMonthlyIncome
## t = 69.3527, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1956816 0.2068243
## sample estimates:
## cor
## 0.2012595
Note: The red dot shows the mean of the plotted feature(s)
It seems that the Debt To Income Ratio has no visible effect on the Loan Original Amount.
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and DebtToIncomeRatio
## t = 3.2828, df = 105381, p-value = 0.001028
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.004074882 0.016148830
## sample estimates:
## cor
## 0.01011222
Most loans between 0 and 25000 USD are funded by individual investments of 50 to 100 USD.
##
## Pearson's product-moment correlation
##
## data: LoanPerInvestor and LoanOriginalAmount
## t = 35.555, df = 86121, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1136895 0.1268536
## sample estimates:
## cor
## 0.1202769
Note: The red dot shows the mean of the plotted feature(s)
Top left shows high loans with low Debt Coverage Ratio while bottom right shows low loans with high Debt Coverage Ratio.
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and DebtCoverageRatio2
## t = -29.1052, df = 113000, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.09204415 -0.08046991
## sample estimates:
## cor
## -0.08625994
A correlation coefficient of -0.086 shows a very weak negative correlation between these two features.
## Warning in `[<-.factor`(`*tmp*`, loan$IsBorrowerHomeowner == FALSE, value
## = structure(c(1L, : invalid factor level, NA generated
## Warning in `[<-.factor`(`*tmp*`, loan$IsBorrowerHomeowner == TRUE, value =
## structure(c(1L, : invalid factor level, NA generated
Note: The red dot shows the mean of the plotted feature(s)
Attorneys have the highest loans and Bus Drivers the lowest.
Note: The red dot shows the mean of the plotted feature(s)
Most payments are between 0 and 600 USD and have rates between 5% and 35%.
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and MonthlyLoanPayment
## t = -85.2021, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2501933 -0.2392759
## sample estimates:
## cor
## -0.2447424
Note: The red dot shows the mean of the plotted feature(s)
Scores from 500 to 650 seem to have similar rates around 25%. From 650 to 900 the rates fall from 25% to 10%.
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and AverageCreditScore
## t = -175.1695, df = 113344, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4661358 -0.4569730
## sample estimates:
## cor
## -0.4615667
The Monthly Income doesn’t seem to have a huge effect on the Borrower Rate.
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and StatedMonthlyIncome
## t = -30.1548, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.09473938 -0.08321827
## sample estimates:
## cor
## -0.0889818
Note: The red dot shows the mean of the plotted feature(s)
The Debt to Income Ratio doesn’t seem to have a huge effect on the Borrower Rate.
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and DebtToIncomeRatio
## t = 20.4649, df = 105381, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.05690080 0.06892819
## sample estimates:
## cor
## 0.06291678
It seems that Loan per Investor has no visible effect on the Borrower Rate.
##
## Pearson's product-moment correlation
##
## data: LoanPerInvestor and BorrowerRate
## t = 20.6332, df = 86121, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.06348697 0.07677859
## sample estimates:
## cor
## 0.07013589
Note: The red dot shows the mean of the plotted feature(s)
Across all Debt Coverage Ratios, the Borrower Rates seem to be distributed evenly between 5% and 35%.
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and DebtCoverageRatio2
## t = 0.4327, df = 113000, p-value = 0.6652
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.004543397 0.007117585
## sample estimates:
## cor
## 0.001287138
## Warning in `[<-.factor`(`*tmp*`, loan$IsBorrowerHomeowner == FALSE, value
## = structure(c(1L, : invalid factor level, NA generated
## Warning in `[<-.factor`(`*tmp*`, loan$IsBorrowerHomeowner == TRUE, value =
## structure(c(1L, : invalid factor level, NA generated
##
## Pearson's product-moment correlation
##
## data: Investors and MonthlyLoanPayment
## t = 141.8441, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3824632 0.3923333
## sample estimates:
## cor
## 0.3874093
##
## Pearson's product-moment correlation
##
## data: Investors and AverageCreditScore
## t = 94.9155, df = 113344, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2659485 0.2767345
## sample estimates:
## cor
## 0.27135
##
## Pearson's product-moment correlation
##
## data: MonthlyLoanPayment and AverageCreditScore
## t = 102.9909, df = 113344, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2871995 0.2978465
## sample estimates:
## cor
## 0.292532
This plot shows Loan Original Amount vs Monthly Loan Payment by Term. You can see three distinct lines, one for each of the three terms, because Loan Original Amount and Monthly Loan Payment are almost perfectly correlated within a term.
Correlations:
Loan Original Amount by Monthly Loan Payment
Term: 12 Months
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and MonthlyLoanPayment
## t = 90.8143, df = 1612, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9062526 0.9222398
## sample estimates:
## cor
## 0.914603
Term: 36 Months
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and MonthlyLoanPayment
## t = 1779.007, df = 87776, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9862350 0.9865921
## sample estimates:
## cor
## 0.9864147
Term: 60 Months
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and MonthlyLoanPayment
## t = 710.2177, df = 24543, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9759372 0.9770983
## sample estimates:
## cor
## 0.9765248
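Per-term coefficients like the three above can be computed in one pass by splitting the data on the grouping column. The following is a sketch on synthetic data, not the report's code; the data frame `demo` and its payment formula are invented so that the example runs on its own.

```r
set.seed(7)
# Synthetic loans: the payment is roughly amount / term plus noise.
demo <- data.frame(
  Term = sample(c(12, 36, 60), 600, replace = TRUE),
  LoanOriginalAmount = runif(600, 1000, 35000)
)
demo$MonthlyLoanPayment <- demo$LoanOriginalAmount / demo$Term * 1.1 +
  rnorm(600, sd = 20)

# One Pearson coefficient per term.
by_term <- sapply(split(demo, demo$Term), function(d) {
  cor(d$LoanOriginalAmount, d$MonthlyLoanPayment)
})
by_term

# Pooled coefficient across all terms, for comparison.
cor(demo$LoanOriginalAmount, demo$MonthlyLoanPayment)
```

Replacing `cor()` with `cor.test()` inside the function would return the full test result per group instead of only the coefficient.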
Homeowners (red, top left) seem to have higher loans with lower rates than non-homeowners (green, bottom right). One explanation could be the security a home offers.
Correlation between Loan Original Amount and Borrower Rate:
Homeowner: yes:
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and BorrowerRate
## t = -82.0419, df = 57476, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3310751 -0.3164386
## sample estimates:
## cor
## -0.3237762
Homeowner: no:
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and BorrowerRate
## t = -74.1159, df = 56457, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3052753 -0.2902407
## sample estimates:
## cor
## -0.2977764
Non-homeowners (left, dark green) have fewer investors, while homeowners (bottom right, red) have more investors and mostly rates below 20%.
Correlation between Borrower Rate and Investors:
Homeowner: yes
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and Investors
## t = -70.3709, df = 57476, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2891552 -0.2741017
## sample estimates:
## cor
## -0.2816457
Homeowner: no
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and Investors
## t = -59.0455, df = 56457, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2489195 -0.2333816
## sample estimates:
## cor
## -0.241166
Note: The year 2005 is missing because there are only 22 observations for it in the dataset.
In the later years, Borrowers seem to have higher Stated Monthly Incomes (right) while the Loan Original Amount seems similar.
Correlation between Loan Original Amount and Stated Monthly Income:
Years 2006 to 2009:
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and StatedMonthlyIncome
## t = 47.1695, df = 30963, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2485014 0.2692846
## sample estimates:
## cor
## 0.258923
Years 2010 to 2014:
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and StatedMonthlyIncome
## t = 53.2303, df = 82948, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1751564 0.1883172
## sample estimates:
## cor
## 0.1817449
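The split into the two periods could be sketched as below. The report does not show which date column the year was derived from, so `LoanOriginationDate` is an assumption here, and the six rows of `demo` are invented for illustration.

```r
# Hypothetical mini data set; LoanOriginationDate is assumed to be the
# date column from which the year is derived.
demo <- data.frame(
  LoanOriginationDate = as.Date(c("2006-03-01", "2008-07-15", "2009-12-31",
                                  "2010-01-01", "2012-05-20", "2014-02-10")),
  LoanOriginalAmount  = c(3000, 5000, 4000, 10000, 15000, 8000),
  StatedMonthlyIncome = c(2500, 4000, 3500, 5000, 9000, 6000)
)

# Extract the year and assign each loan to one of the two periods.
demo$Year   <- as.integer(format(demo$LoanOriginationDate, "%Y"))
demo$Period <- ifelse(demo$Year <= 2009, "2006-2009", "2010-2014")

# Correlation of amount and income within each period.
by_period <- sapply(split(demo, demo$Period), function(d) {
  cor(d$LoanOriginalAmount, d$StatedMonthlyIncome)
})
by_period
```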
For clarity, the Employment Statuses are grouped into full-time and non-full-time employment.
People in full-time employment have much higher scores and loans than people who are not in full-time employment.
Correlation between Loan Original Amount and Average Credit Score:
Full-time employment (Employed, Full-time, Self-employed)
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and AverageCreditScore
## t = 112.863, df = 99809, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3309088 0.3419122
## sample estimates:
## cor
## 0.336422
Non-full-time employment (Not available, Other, Part-time, Not employed, Retired)
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and AverageCreditScore
## t = 38.2238, df = 13533, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2968717 0.3272835
## sample estimates:
## cor
## 0.3121576
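The grouping of statuses used for the two correlations above could be built like this. It is a sketch under the assumption that the statuses are stored as text in a column such as `EmploymentStatus`; the vector `status` and the group labels are made up for the example.

```r
# Hypothetical sample of EmploymentStatus values.
status <- c("Employed", "Full-time", "Self-employed", "Part-time",
            "Retired", "Not employed", "Other", "Not available")

# Statuses counted as full-time employment in this report.
full_time_levels <- c("Employed", "Full-time", "Self-employed")

# Everything else falls into the second group.
employment_group <- factor(
  ifelse(status %in% full_time_levels, "Full-time", "Not full-time"),
  levels = c("Full-time", "Not full-time")
)
table(employment_group)
```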
Comparing Loan Original Amount with Debt Coverage Ratio shows a difference between homeowners and non-homeowners. At similar Debt Coverage Ratios, homeowners (red) get higher loans than non-homeowners (green).
Correlation:
Homeowner: yes
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and DebtCoverageRatio2
## t = -24.8585, df = 57099, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.11157976 -0.09535107
## sample estimates:
## cor
## -0.1034723
Homeowner: no
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and DebtCoverageRatio2
## t = -18.2806, df = 55899, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08532444 -0.06884351
## sample estimates:
## cor
## -0.07708924
Note: The year 2005 is missing because there are only 22 observations for it in the dataset.
The Loan Per Investor has moved over time from an interval of 20 to 60 USD to an interval of 50 to 100 USD (from left to right in the plots). The Loan Original Amount seems to be constant.
Correlation between Loan Original Amount and Loan Per Investor:
Years 2006 to 2009:
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and LoanPerInvestor
## t = 9.7733, df = 30963, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.04434563 0.06655356
## sample estimates:
## cor
## 0.05545645
Years 2010 to 2014:
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and LoanPerInvestor
## t = 188.2829, df = 82948, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5424041 0.5519395
## sample estimates:
## cor
## 0.5471896
Note: The year 2005 is missing because there are only 22 observations for it in the dataset.
The Borrower Rate seems to stay constant between 5% and 35%, while the Loan Per Investor has risen in the later years.
Correlation between Borrower Rate and Loan Per Investor:
Years 2006 to 2009:
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and LoanPerInvestor
## t = -2.633, df = 30963, p-value = 0.008468
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.026095389 -0.003823944
## sample estimates:
## cor
## -0.01496152
Years 2010 to 2014:
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and LoanPerInvestor
## t = -79.2497, df = 82948, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2716199 -0.2589674
## sample estimates:
## cor
## -0.2653051
Most people in the top Occupations have a Debt to Income Ratio between 0 and 0.5, except Administrative Assistants and Bus Drivers. These two groups also have lower Loan Amounts than the others.
Correlation between Loan Original Amount and Debt To Income Ratio: I divided the 10 Occupations into high-income and low-income jobs.
High Income Jobs (Analyst, Accountant/CPA, Attorney)
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and DebtToIncomeRatio
## t = 5.3086, df = 7431, p-value = 1.137e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.03878575 0.08408237
## sample estimates:
## cor
## 0.06146571
Low Income Jobs (Administrative Assistant, Civil Service, Bus Driver, Architect, Car Dealer, Chemist, Biologist)
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and DebtToIncomeRatio
## t = 1.6699, df = 5869, p-value = 0.09499
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.003790085 0.047346546
## sample estimates:
## cor
## 0.02179248
The loan category doesn’t seem to have a visible effect on the Borrower Rate, because all categories have rates between 5% and 35%. But you can see that people who use their loans for Debt Consolidation, Business and Home Improvements have the highest monthly incomes.
Correlation between Borrower Rate and Stated Monthly Income: I divided the 10 Loan Categories into two groups of five categories each.
First 5 Loan Categories (Debt Consolidation, NA, Other, Home Improvement, Business)
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and StatedMonthlyIncome
## t = -27.2829, df = 100387, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.09192925 -0.07964842
## sample estimates:
## cor
## -0.08579209
Last 5 Loan Categories (Auto, Personal Loan, Student Use, Baby & Adoption, Boat)
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and StatedMonthlyIncome
## t = -10.3106, df = 6005, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1566581 -0.1069589
## sample estimates:
## cor
## -0.1318914
Note: The year 2005 is missing because there are only 22 observations for it in the dataset.
You can see that the Debt Coverage Ratio changed over time. In the early years 2006-2009 (bottom left) the amounts are in the middle range and the Debt Coverage Ratio is low. There are also high Debt Coverage Ratios with low amounts (bottom right). In the later years 2010-2014 there are many more loans with higher amounts and lower Debt Coverage Ratios (top left). You can see that more people are trusting the Prosper platform with higher amounts and they are more trustworthy because of the higher ratios.
Correlation between Loan Original Amount and Debt Coverage Ratio:
Years 2006 to 2009:
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and DebtCoverageRatio2
## t = -14.3583, df = 30493, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.09308634 -0.07078963
## sample estimates:
## cor
## -0.08194824
Years 2010 to 2014:
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and DebtCoverageRatio2
## t = -26.8548, df = 82485, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.09985947 -0.08632921
## sample estimates:
## cor
## -0.09309864
In these plots you can see how different occupations have different loan amounts backed by their incomes. There are three groups. Group 1 has top loans with top incomes, like Analysts, Accountants and Attorneys. Group 2 has top loans with lower incomes, like Civil Service, Car Dealers and Chemists. Group 3 has low loans and low incomes, like Administrative Assistants, Bus Drivers, Architects and Biologists. Group 1 is the most preferred group, which is why they get high loans. Group 2 also has a good standing, because they get high loans with lower incomes. But group 3 isn’t that interesting for lenders. The result is that they get lower loans than the other groups.
Correlation between Loan Original Amount and Stated Monthly Income: I divided the 10 Occupations into high-income and low-income jobs.
High Income Jobs (Analyst, Accountant/CPA, Attorney)
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and StatedMonthlyIncome
## t = 26.3029, df = 7879, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2636895 0.3042837
## sample estimates:
## cor
## 0.2841139
Low Income Jobs (Administrative Assistant, Civil Service, Bus Driver, Architect, Car Dealer, Chemist, Biologist)
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and StatedMonthlyIncome
## t = 31.8228, df = 6122, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3550514 0.3980380
## sample estimates:
## cor
## 0.3767475
People with high incomes (right in the plots) mostly use their loans for Debt Consolidation, Home Improvements and Business. People with low incomes (left in the plots) mostly use them for the above-mentioned categories and also for Student Use, Baby & Adoption, Auto and Boats.
Correlation between Loan Original Amount and Stated Monthly Income: I divided the 10 Loan Categories into two groups of five categories each.
First 5 Loan Categories (Debt Consolidation, NA, Other, Home Improvement, Business)
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and StatedMonthlyIncome
## t = 61.7547, df = 100387, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1853423 0.1972614
## sample estimates:
## cor
## 0.1913089
Last 5 Loan Categories (Auto, Personal Loan, Student Use, Baby & Adoption, Boat)
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and StatedMonthlyIncome
## t = 28.6632, df = 6005, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3244730 0.3689677
## sample estimates:
## cor
## 0.3469155